NSF PAR Search | NSF Public Access Repository

Approximation trees: statistical reproducibility in model distillation

https://doi.org/10.1007/s10618-022-00907-3

Zhou, Yichen; Zhou, Zhengze; Hooker, Giles (January 2023, Data Mining and Knowledge Discovery)

S-LIME: Stabilized-LIME for Model Explanation

https://doi.org/10.1145/3447548.3467274

Zhou, Zhengze; Hooker, Giles; Wang, Fei (August 2021, KDD '21: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining)
A growing number of machine learning models are deployed in high-stakes domains such as finance and healthcare. Despite their strong performance, many of these models are black boxes that are hard to explain, and researchers are increasingly developing methods to interpret them. Post hoc explanations based on perturbations, such as LIME [39], are widely used to interpret a machine learning model after it has been built. This class of methods has been shown to exhibit large instability, which undermines the effectiveness of the methods themselves and harms user trust. In this paper, we propose S-LIME, which uses a hypothesis testing framework based on the central limit theorem to determine the number of perturbation points needed to guarantee stability of the resulting explanation. Experiments on both simulated and real-world data sets demonstrate the effectiveness of our method.
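The idea of using a central-limit-theorem test to decide how many perturbation points are enough can be illustrated with a minimal Python sketch. This is a simplified stand-in, not the procedure from the paper: the function name `perturbations_needed`, the input `diffs` (a hypothetical list of per-perturbation score differences between the two leading candidate features), and the two-sided test at level `alpha` are all assumptions made for illustration.

```python
import math
from statistics import NormalDist, mean, stdev


def perturbations_needed(diffs, alpha=0.05):
    """Return (is_stable, n) for a sample of score differences.

    diffs holds at least two per-perturbation differences between the
    scores of the top two candidate features (a hypothetical input).
    By the central limit theorem, their mean is approximately normal,
    so a z-test can check whether the ranking is clearly resolved.
    """
    n = len(diffs)
    d_bar = mean(diffs)
    sd = stdev(diffs)
    if d_bar == 0:
        return False, None  # no observed gap: no sample size can be estimated
    if sd == 0:
        return True, n  # constant nonzero gap: ranking is already resolved

    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_stat = abs(d_bar) / (sd / math.sqrt(n))
    if z_stat >= z_alpha:
        return True, n

    # Sample size at which the same mean and spread would reach significance.
    return False, math.ceil((z_alpha * sd / d_bar) ** 2)


stable, n = perturbations_needed([1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, 1.0])
print(stable, n)
```

When the test fails, the returned `n` suggests how many perturbation points to draw before testing again, under the assumption that the mean and spread of the differences stay roughly the same.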
Unbiased Measurement of Feature Importance in Tree-Based Methods

https://doi.org/10.1145/3429445

Zhou, Zhengze; Hooker, Giles (April 2021, ACM Transactions on Knowledge Discovery from Data)
We propose a modification that corrects the bias of split-improvement variable importance measures in Random Forests and other tree-based methods. These measures have been shown to be biased towards increasing the importance of features with more potential splits. We show that by appropriately incorporating split-improvement as measured on out-of-sample data, this bias can be corrected, yielding better summaries and screening tools.
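To make the contrast between in-sample and out-of-sample split improvement concrete, here is a minimal Python sketch for a single regression split. It is not the estimator proposed in the paper: the helper names `sse` and `split_improvement`, the squared-error criterion, and the choice to fit node means on training data and score them on held-out data are assumptions chosen for illustration.

```python
from statistics import mean


def sse(values, center):
    """Sum of squared errors of values around a fixed center."""
    return sum((v - center) ** 2 for v in values)


def split_improvement(train, held_out, threshold):
    """Reduction in squared error from one split, scored on held_out.

    train and held_out are lists of (x, y) pairs. The parent and child
    means are fitted on train only, and the error reduction is then
    measured on held_out. Pass train as held_out for the in-sample score.
    """
    parent_mean = mean(y for _, y in train)
    left = [y for x, y in train if x <= threshold]
    right = [y for x, y in train if x > threshold]
    if not left or not right:
        return 0.0  # degenerate split: nothing is partitioned
    left_mean, right_mean = mean(left), mean(right)

    before = sse([y for _, y in held_out], parent_mean)
    after = sum(
        (y - (left_mean if x <= threshold else right_mean)) ** 2
        for x, y in held_out
    )
    return before - after


train = [(1, 0.0), (2, 0.0), (3, 1.0), (4, 1.0)]
held_out = [(1, 1.0), (4, 0.0)]
print(split_improvement(train, train, 2.5))     # in-sample score
print(split_improvement(train, held_out, 2.5))  # out-of-sample score
```

The in-sample score cannot be negative, because the child means minimize squared error on the data they were fitted to. The held-out score can be negative when a split does not generalize, which is the property that makes out-of-sample scoring less favorable to features that merely offer many candidate splits.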
Unbiased Measurement of Feature Importance in Tree-Based Methods

https://doi.org/10.1145/3429445

Zhou, Zhengze; Hooker, Giles (January 2021, ACM Transactions on Knowledge Discovery from Data)